With the growing popularity of short-form video sharing platforms such as {\em Instagram} and {\em Vine}, there has been an increasing need for techniques that automatically extract highlights from video. Whereas prior works have approached this problem with heuristic rules or supervised learning, we present an unsupervised learning approach that takes advantage of the abundance of user-edited videos on social media websites such as YouTube. Based on the idea that the most significant sub-events within a video class are commonly present among edited videos while less interesting ones appear less frequently, we identify the significant sub-events via a robust recurrent auto-encoder trained on a collection of user-edited videos queried for each particular class of interest. The auto-encoder is trained using a proposed shrinking exponential loss function that makes it robust to noise in the web-crawled training data, and is configured with bidirectional long short-term memory (LSTM)~\cite{LSTM:97} cells to better model the temporal structure of highlight segments. Unlike supervised techniques, our method can infer highlights using only a set of downloaded edited videos, without also needing their pre-edited counterparts, which are rarely available online. Extensive experiments indicate the promise of our proposed solution in this challenging unsupervised setting.
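The core unsupervised idea above can be illustrated with a minimal sketch: segments whose features resemble the sub-events that recur across many edited videos reconstruct well under an auto-encoder fit to those videos, while rare segments reconstruct poorly. For brevity this sketch uses a linear auto-encoder fit in closed form via SVD on synthetic feature vectors, not the paper's recurrent bidirectional-LSTM model or its shrinking exponential loss; all variable names and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-segment features: "common" sub-events lie near
# a low-dimensional subspace (they recur across edited videos), while rare,
# less interesting segments are scattered through the full feature space.
basis = rng.normal(size=(3, 16))               # hypothetical latent directions
common = rng.normal(size=(200, 3)) @ basis     # frequently occurring sub-events
rare = rng.normal(size=(20, 16)) * 3.0         # infrequent segments (noise)
train = np.vstack([common, rare])              # noisy web-crawled training set

# A linear auto-encoder's optimum is projection onto the top principal
# components, so we can "train" it in closed form with an SVD.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
encoder = vt[:3]                               # top-3 principal directions

def reconstruction_error(x):
    z = (x - mean) @ encoder.T                 # encode to latent space
    x_hat = z @ encoder + mean                 # decode back to feature space
    return np.linalg.norm(x - x_hat, axis=-1)

# Segments resembling common sub-events reconstruct with low error and would
# be scored as highlight candidates; rare segments reconstruct poorly.
err_common = reconstruction_error(common).mean()
err_rare = reconstruction_error(rare).mean()
```

Here low reconstruction error plays the role of the highlight score: the auto-encoder compresses what it has seen often, so frequently occurring sub-events survive the encode-decode round trip while outliers do not.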